GPU accelerated encoder #895
Conversation
Yikes lost all my comments while reviewing on github.dev.... friggin github. Some graphql internal error or something. So I'll just submit this first set of reviews in case. Don't really feel like reviewing again lol... another reason why small PRs are gud, big PRs bad.
Just spent 2 hours reviewing and I barely even got to the important stuff. Suggest no more PRs like this, for the sake of our industry, and of your reviewers' mental health. :D
encoding/icicle/device_setup.go
Outdated
type IcicleDevice struct {
	Device         icicle_runtime.Device
	NttCfg         core.NTTConfig[[icicle_bn254.SCALAR_LIMBS]uint32]
	MsmCfg         core.MSMConfig
	FlatFFTPointsT []icicle_bn254.Affine
	SRSG1Icicle    []icicle_bn254.Affine
}
Is the point that we can have a different ntt/msm config and points loaded on different devices (gpus?)? Do we actually make use of this flexibility? If so, please add comment explaining.
No there's only going to be a single configuration loaded at startup time. Right now we don't support multiple GPUs for an encoder.
Is this mentioned anywhere? Would be good to if not.
encoding/icicle/device_setup.go
Outdated
setupErr = fmt.Errorf("could not setup NTT")
return
just return the setupErr directly?
also prob want to wrap the icicleErr to help debug
cannot return because I don't think that RunOnDevice func will let me. Looks like this:
func RunOnDevice(device *Device, funcToRun func(args ...any), args ...any) {
go func(deviceToRunOn *Device) {
defer runtime.UnlockOSThread()
runtime.LockOSThread()
originalDevice, _ := GetActiveDevice()
SetDevice(deviceToRunOn)
funcToRun(args...)
SetDevice(originalDevice)
}(device)
}
ah my bad I completely missed the function signature.
I see so the code passed to RunOnDevice is run on the gpu...? Did you test that your shared pointer for setupErr is working then? This would mean the gpu is able to go write in the cpu's RAM. Does this only work on mac where cpu/gpu share memory? Or is this something that always works?
No, RunOnDevice locks the OS thread so all of that code runs on the same OS thread. Without it, a goroutine can call SetDevice to select the GPU while running on thread 1, then get rescheduled onto thread 2 mid-execution, where it would fall back to the default device (CPU).
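A minimal sketch of the pinning idea (runPinned is a hypothetical helper mirroring RunOnDevice's structure, not icicle's API; it adds a done channel so the caller can wait):

```go
package main

import (
	"fmt"
	"runtime"
)

// runPinned executes f entirely on one OS thread. Any thread-local state
// (such as a per-thread GPU device selection) set inside f stays in effect
// for the whole call, because the Go scheduler cannot migrate the goroutine
// to another thread while it is locked.
func runPinned(f func()) {
	done := make(chan struct{})
	go func() {
		runtime.LockOSThread()
		defer runtime.UnlockOSThread()
		f() // e.g. SetDevice(...) followed by kernel launches
		close(done)
	}()
	<-done
}

func main() {
	runPinned(func() {
		fmt.Println("running with the goroutine pinned to one OS thread")
	})
}
```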
Can we have a diff-neutral PR just for refactoring? It seems there is code here that doesn't need deep review since it's just existing code moved around?
Yes, if we're happy with this new configuration method then I'll make another PR with that and rebase this one
RUN go build -tags=icicle -o ./bin/server ./cmd/encoder

# Start a new stage for the base image
FROM nvidia/cuda:12.2.2-base-ubuntu22.04
Image size comparison:
ghcr.io/layr-labs/eigenda/encoder latest 781c3866c5dd 6 hours ago 41MB
6129846e8150 6 hours ago 41MB
ghcr.io/layr-labs/eigenda/encoder-icicle latest f530fe9c250d 21 hours ago 760MB
what makes it so large? near 20x is a lot
Didn't look into it, but my guess right now would be the cuda base image, which contains the cuda runtime.
encoding/kzg/prover/noicicle.go
Outdated
func CreateIcicleBackendProver(p *Prover, params encoding.EncodingParams, fs *fft.FFTSettings, ks *kzg.KZGSettings) (*ParametrizedProver, error) {
	// Not supported
	return nil, fmt.Errorf("icicle backend called without icicle build tag")
The point of noicicle.go file is that the icicle backend should only be functional when the icicle build tag is used. This is so we don't need to include the icicle backend files in the traditional encoder backend, otherwise it may break the existing build.
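The stub-file pattern described here can be sketched like this (names are illustrative; in the real layout the stub sits behind a //go:build !icicle constraint in noicicle.go, with the real constructor in a sibling file behind //go:build icicle):

```go
package main

import (
	"errors"
	"fmt"
)

// createIcicleProver is the stub variant that exists only when the icicle
// build tag is absent. Because the default build (no -tags=icicle) compiles
// only this file, the icicle C dependencies are never linked in, and the
// traditional encoder build stays unaffected.
func createIcicleProver() (any, error) {
	return nil, errors.New("icicle backend called without icicle build tag")
}

func main() {
	_, err := createIcicleProver()
	fmt.Println(err)
}
```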
}

// KzgCommitmentsBackend represents a backend capable of computing various KZG commitments.
type KzgCommitmentsBackend interface {
Moved commitments to its own backend in case we would like to accelerate them in the future.
@@ -331,3 +457,99 @@ func toUint64Array(chunkIndices []encoding.ChunkNumber) []uint64 {
	}
	return res
}

func (p *Prover) newProver(params encoding.EncodingParams) (*ParametrizedProver, error) {
this file should be reviewed carefully
encoding/kzg/verifier/verifier.go
Outdated
)

// VerifierOption defines a function that configures a Verifier
type VerifierOption func(*Verifier) error
This can go in its own PR; it's applying the same options pattern refactor to the verifier, so not super relevant to the GPU encoder
It looks more verbose and is deviating from the convention, what're the main problems it solves here compared to passing in a config struct?
One argument I would give for this configuration pattern is that it provides a simpler constructor that is extensible and backwards compatible.
For example, with a config struct we would have to modify every client that calls the constructor when we add a new configuration value vs. this pattern you add a WithX
function in the library that sets the field and does whatever validation it needs to. Then clients can optionally update to use that new added configuration.
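A minimal sketch of the options pattern being discussed (the Verifier fields and WithX names here are illustrative, not the actual eigenda types):

```go
package main

import (
	"errors"
	"fmt"
)

// Verifier is an illustrative stand-in for the type being configured.
type Verifier struct {
	numWorkers int
	verbose    bool
}

// VerifierOption defines a function that configures a Verifier.
type VerifierOption func(*Verifier) error

// WithNumWorkers validates its input at option-application time.
func WithNumWorkers(n int) VerifierOption {
	return func(v *Verifier) error {
		if n <= 0 {
			return errors.New("numWorkers must be positive")
		}
		v.numWorkers = n
		return nil
	}
}

func WithVerbose(verbose bool) VerifierOption {
	return func(v *Verifier) error {
		v.verbose = verbose
		return nil
	}
}

// NewVerifier keeps a stable signature: adding a new WithX option later
// does not break existing callers, who simply don't pass it.
func NewVerifier(opts ...VerifierOption) (*Verifier, error) {
	v := &Verifier{numWorkers: 1} // defaults
	for _, opt := range opts {
		if err := opt(v); err != nil {
			return nil, err
		}
	}
	return v, nil
}

func main() {
	v, err := NewVerifier(WithNumWorkers(4), WithVerbose(true))
	fmt.Println(v.numWorkers, v.verbose, err)
}
```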
with a config struct we would have to modify every client that calls the constructor when we add a new configuration value
Not sure about this, if that new field is optional, the caller doesn't need to do anything. If it's required, callers will have to make changes in either case.
My concerns are the boilerplate and the scalability of continually adding WithX; we may end up with a long list of fields (like in some big structs).
Also, it looks like it still needs to operate on a config struct if the validation runs across multiple fields (e.g. invariants defined over multiple fields).
My concerns are the boilerplate and the scalability of continually adding WithX; we may end up with a long list of fields (like in some big structs).
These seem like tractable problems though, here is an example in grpc library:
https://github.com/grpc/grpc-go/blob/66385b28b3fe71a4895f00d581ede0a344743a3f/dialoptions.go#L248
Think I agree with both of you. I've grown fond of the options pattern and like it best, but I still think consistency across a codebase is the most important thing, so kind of agree with jian about this.
No problem, I'll revert to the previous method
"github.com/consensys/gnark-crypto/ecc/bn254/fr"
)

type ParametrizedEncoder struct {
Refactor to make the encoder parametrized based on FFT settings (similar to Prover). Don't think it's too valuable since max time is ~600ms for loading largest fft setting. Am thinking of reverting
@@ -0,0 +1,103 @@
//go:build icicle

package icicle
This folder is icicle backend related setup
@@ -0,0 +1,46 @@
//go:build icicle

package icicle
This folder is icicle related computations for MultiProofs
encoding/rs/icicle/extend_poly.go
Outdated
"github.com/ingonyama-zk/icicle/v3/wrappers/golang/runtime"
)

type RsIcicleComputeDevice struct {
This is icicle related computations for RS Encode.
One thing I'm not too sure how to deal with is testing Icicle/GPU encoder correctness. The only strategy I can think of is to compare the output of the default backend and the icicle backend. Also, if we wanted it to be part of CI, then we would need to acquire GPU-based GitHub Actions runners.
func CreateIcicleBackendEncoder(e *Encoder, params encoding.EncodingParams, fs *fft.FFTSettings) (*ParametrizedEncoder, error) {
	icicleDevice, err := icicle.NewIcicleDevice(icicle.IcicleDeviceConfig{
		GPUEnable: e.Config.GPUEnable,
It looks like Icicle will also support CPU now?
Yes, but it's not as performant as the gnark backend; am trying to look into why
encoding/rs/icicle/extend_poly.go
Outdated
var evals []fr.Element
g.NttCfg.BatchSize = int32(1)
runtime.RunOnDevice(&g.Device, func(args ...any) {
If the GPU is enabled, then this func will be executed on GPU by the runtime?
yes because the device variable will be set to the GPU device here
eigenda/encoding/icicle/device_setup.go
Line 88 in dfe247d
Device: device,
For performance, it looks like the main data movement to/from GPU will be the coefficients and the resulting evaluations?
For RSEncode yes:
For multiproofs it is also the same (move coefficients, move out proofs) but in addition we need to also transfer SRS Table to compute the MSM. This can be done at SetupMSM at initialization but not sure how much speedup that is.
Fs:          fs,
verbose:     verbose,
NumRSWorker: runtime.GOMAXPROCS(0),
return &ParametrizedEncoder{
So when you say "backend", it looks like a software library that computes RS encoding? I.e. it decouples device (CPU/GPU) and backend (the software that executes compute on the device)
Right now I consider backend as library that implements the cryptographic primitive like NTT, MSM. We have our own implementation of NTT borrowed from protolambda and MSM from gnark.
It could be plausible that there exists libraries that implement the higher level primitives like multiproofs, commitment, RS encode and we can try to make our code more of a frontend that calls those. But for now we focus on the core primitives that give the speedup factor.
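A toy sketch of that decoupling, with a backend interface over the core primitives (all names and types here are illustrative stand-ins for the real fr.Element/bn254 types, and the arithmetic is deliberately trivial):

```go
package main

import "fmt"

// Scalar and Point stand in for field and curve elements; the real code
// uses fr.Element and bn254.G1Affine.
type Scalar = uint64
type Point = uint64

// CryptoBackend captures the idea of a swappable library (gnark, icicle)
// that implements the core primitives behind commitments and proofs.
type CryptoBackend interface {
	NTT(coeffs []Scalar) []Scalar
	MSM(scalars []Scalar, points []Point) Point
}

// naiveBackend is a toy CPU implementation for illustration only; NTT is
// an identity pass and MSM is a plain dot product.
type naiveBackend struct{}

func (naiveBackend) NTT(coeffs []Scalar) []Scalar { return coeffs }

func (naiveBackend) MSM(scalars []Scalar, points []Point) Point {
	var acc Point
	for i := range scalars {
		acc += Point(scalars[i]) * points[i]
	}
	return acc
}

func main() {
	// Higher-level code (multiproofs, commitments, RS encode) would only
	// see the interface, so backends can be swapped without touching it.
	var b CryptoBackend = naiveBackend{}
	fmt.Println(b.MSM([]Scalar{1, 2}, []Point{10, 20})) // 1*10 + 2*20 = 50
}
```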
encoding/kzg/prover/commitments.go
Outdated
Srs        *kzg.SRS
G2Trailing []bn254.G2Affine
}
would be good to put a comment, describing e(ComputeCommitment, g2_shifted) = e(g1, ComputeLengthCommitment)
should have been
e(g1_shifted, ComputeLengthCommitment) = e(g1, ComputeLengthProof)
e(ComputeCommitment, g2) = e(g1, ComputeLengthCommitment)
I decided to do documentation update in a separate PR
For documentation, I think this one appears to be quite readable: https://github.com/crate-crypto/go-kzg-4844/blob/master/internal/kzg/kzg_prove.go, which we can take some inspiration from
@@ -162,7 +162,8 @@ func RunDisperserServer(ctx *cli.Context) error {
	bucketName := config.BlobstoreConfig.BucketName
	logger.Info("Blob store", "bucket", bucketName)
	if config.DisperserVersion == V2 {
		prover, err := prover.NewProver(&config.EncodingConfig, true)
		config.EncodingConfig.LoadG2Points = true
		prover, err := prover.NewProver(&config.EncodingConfig, nil)
the first arg is kzgConfig, according to the func. Name issue?
I think it's acceptable
return nil, nil, fmt.Errorf("failed to create SRS table: %w", err)
}

fftPoints, err := subTable.GetSubTables(params.NumChunks, params.ChunkLength)
I don't think we ever need fftPoints, chance for optimization
Will include this in followup optimizations
jobChan := make(chan JobRequest, numWorker)
results := make(chan error, numWorker)

for w := uint64(0); w < numWorker; w++ {
try workerpool?
Will include in followup. Wrote ticket
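For reference, the bounded-pool pattern being suggested can be sketched with just the stdlib (runPool is a hypothetical helper, not the workerpool library's API):

```go
package main

import (
	"fmt"
	"sync"
)

// runPool fans jobs out to numWorkers goroutines and collects one error
// per job. Each worker writes to a distinct index, so no extra locking is
// needed around the results slice.
func runPool(numWorkers int, jobs []int, work func(int) error) []error {
	jobChan := make(chan int)
	errs := make([]error, len(jobs))
	var wg sync.WaitGroup
	for w := 0; w < numWorkers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobChan {
				errs[i] = work(jobs[i])
			}
		}()
	}
	for i := range jobs {
		jobChan <- i
	}
	close(jobChan) // workers drain remaining jobs and exit
	wg.Wait()
	return errs
}

func main() {
	errs := runPool(4, []int{1, 2, 3, 4, 5}, func(n int) error {
		fmt.Println("processed", n)
		return nil
	})
	fmt.Println(len(errs))
}
```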
LG mostly, a bunch of small comments
encodingConfig := &encoding.Config{
	BackendType: backendType,
	GPUEnable:   config.ServerConfig.GPUEnable,
	NumWorker:   config.EncoderConfig.NumWorker,
Do we have understanding about how this parallelism interacts with the blob level parallelism at the server?
From my understanding the interaction is that it can introduce CPU contention if we are handling more than 1 blob concurrently. In the worst case we set NumWorkers = # CPU on machine and two blobs are doing computations that require all workers.
}

// Setup MSM if parameters are provided
if config.FFTPointsT != nil && config.SRSG1 != nil {
Should this be validated (right after entering NewIcicleDevice) for properties of SRSG1 and FFTPointsT? Or are they validated at the caller?
device := runtime.CreateDevice("CPU", 0)
if runtime.IsDeviceAvailable(&device) {
	slog.Info("CPU device available, setting device")
}
Is it safe to assume the CreateDevice never fails for CPU?
I believe that's the case but not sure.
I'm not sure about the behavior.
It may be safer to return an error and bubble up
}

slog.Info("CUDA device not available, falling back to CPU")
return setupCPUDevice()
The GPUEnable is instructed by the cmdline flag; should this not fall back? Or add a comment in the flag that it'll fall back if there is no GPU.
Added comment to flag
@@ -13,66 +14,18 @@ import (
	"github.com/consensys/gnark-crypto/ecc/bn254/fr"
)

type KzgCpuProofDevice struct {
type KzgMultiProofGnarkBackend struct {
Shall we create a gnark folder, similar to icicle, to organize these implementations with gnark?
sumVec, err := p.MsmBatchOnDevice(flattenStoreCopyToDevice, p.FlatFFTPointsT, int(numPoly)*int(dimE)*2)
if err != nil {
	icicleErr = fmt.Errorf("msm error: %w", err)
here and below: why not return on such an error? The proof computing is already failed at these points
How is this used?
Essentially just makes it so we don't have to compile icicle libraries unless we explicitly add the build tag.
encoding/kzg/prover/prover.go
Outdated
func (p *Prover) createGnarkBackendProver(params encoding.EncodingParams, fs *fft.FFTSettings, ks *kzg.KZGSettings) (*ParametrizedProver, error) {
	if p.Config.GPUEnable {
		return nil, fmt.Errorf("GPU is not supported in default backend")
nit: "default" -> "gnark" (some occurrences below as well)
@@ -14,17 +14,21 @@ import (
)

func TestBatchEquivalence(t *testing.T) {
It'll be nice to have an icicle-gnark equivalence test for proofs, but it may need a GPU setup for unit tests... Maybe we can start with CPU (for both), in a follow-up
Mostly lg. There were a couple comments before would be nice to address
preprocessDone := time.Now()

// Prepare data before GPU operations
flattenCoeffStoreFr := make([]fr.Element, len(coeffStore)*len(coeffStore[0]))
The computeCoeffStore may create a 1D result directly; then this flattening can be saved
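A sketch of the suggested optimization: the producer writes straight into a flat slice using row-major indexing instead of building a 2D slice and flattening it in a second pass (fillFlat and its signature are hypothetical, not the actual computeCoeffStore):

```go
package main

import "fmt"

// fillFlat allocates the 1D result once and writes each element at
// flat[r*cols+c], avoiding the intermediate [][]T and the copy pass
// that flattening would otherwise require.
func fillFlat(rows, cols int, at func(r, c int) uint64) []uint64 {
	flat := make([]uint64, rows*cols)
	for r := 0; r < rows; r++ {
		for c := 0; c < cols; c++ {
			flat[r*cols+c] = at(r, c) // row-major layout, same as the flattened form
		}
	}
	return flat
}

func main() {
	flat := fillFlat(2, 3, func(r, c int) uint64 { return uint64(r*10 + c) })
	fmt.Println(flat) // [0 1 2 10 11 12]
}
```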
// KzgMultiProofsBackend represents a backend capable of computing KZG multiproofs.
type KzgMultiProofsBackend interface {
	ComputeMultiFrameProof(blobFr []fr.Element, numChunks, chunkLen, numWorker uint64) ([]bn254.G1Affine, error)
So the blobFr are points on the blob's polynomial, but why doesn't this produce a single bn254.G1Affine (which is the proof)?
blobFr are the coefficients of the polynomial. We actually have some documentation about it here https://github.com/Layr-Labs/eigenda/blob/master/docs/spec/attestation/amortized-proving.md#amortized-multireveal-proof-generation-with-the-fft.
These two resources were good for understanding the implementation details (although I still don't fully grasp it yet):
https://alinush.github.io/2021/06/17/Feist-Khovratovich-technique-for-computing-KZG-proofs-fast.html
https://alinush.github.io/2020/03/19/multiplying-a-vector-by-a-toeplitz-matrix.html
// Encoding Reed Solomon using FFT
func (g *RsIcicleBackend) ExtendPolyEval(coeffs []fr.Element) ([]fr.Element, error) {
	// Lock the GPU for operations
	g.GpuLock.Lock()
If it's not using GPU (which is possible for icicle), then wouldn't this lock hurt performance?
The perf is already quite poor on the CPU version of icicle. I would prefer to tackle that separately if that's okay.
LGTM, well done!
Why are these changes needed?
This PR adds support for multiple backends for the encoder. Initially we only have the gnark and icicle backends; see https://dev.ingonyama.com/icicle/overview for more information on the icicle backend. A backend may include GPU acceleration as an option; see https://dev.ingonyama.com/icicle/install_cuda_backend for icicle's GPU accelerated backend.
In order to use the icicle backend, it must be compiled with the icicle build tag.
Checks